I linked yesterday to a Washington Post article on a survey claiming over 600,000 "excess" deaths in Iraq since the invasion, a claim that on its face seems absurd. The group's website publishes the actual survey. I read through it to try to understand how this claim was derived. I'm particularly interested in the methodology, which the Post described as "scientific" and "tried and true."
The methodology is described in the Methods section, and I must say I am unimpressed. The basic idea is pretty simple: visit a number of randomly selected homes and ask how many people died before the invasion and how many after. From this you can derive an estimate of mortality rates. But how this was implemented has some curiosities.
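To make "derive an estimate of mortality rates" concrete, here is a minimal Python sketch of the arithmetic as I understand it. Every number is hypothetical and not taken from the survey, and the function names are my own. It assumes the rate is expressed as deaths per 1,000 people per year and that "excess" deaths are the deaths above what the pre-invasion rate would predict.

```python
def crude_mortality_rate(deaths, person_years):
    """Deaths per 1,000 people per year."""
    return 1000.0 * deaths / person_years


def excess_deaths(pre_rate, post_rate, population, years):
    """Deaths above what the pre-invasion rate would have predicted."""
    return (post_rate - pre_rate) / 1000.0 * population * years


# Hypothetical survey tallies, chosen only to keep the arithmetic easy.
pre_rate = crude_mortality_rate(deaths=10, person_years=2000)
post_rate = crude_mortality_rate(deaths=30, person_years=3000)

# Hypothetical population and time span to extrapolate over.
estimate = excess_deaths(pre_rate, post_rate, population=1_000_000, years=3)
print(pre_rate, post_rate, estimate)
```

The point of the sketch is that a handful of reported deaths gets multiplied up by the whole population, so any error in counting deaths per household is multiplied up along with it.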
First, the authors make a crucial distinction. They define a household as "a unit that ate together, and had a separate entrance from the street or a separate apartment entrance." To begin with, this doesn't even seem to make sense as a definition. It's a unit (of what?) that ate together (implying it's a unit of people) and that has an entrance (implying it's a building). So the definition seems somewhat murky. Note that by specifically defining "household", the authors distinguish it from a residence. This is important because the methodology is to go to residences and ask how many from the household died, not how many in the residence died. I'll come back to this.
But elsewhere, the two terms are seemingly equivalent. From a starting residence, the interviewers "proceed[] to the adjacent residence until 40 households [are] surveyed." If these two terms are synonymous, why call out a specific definition of one? That seems curious. And why structure the survey this way? It would seem a more controlled approach to visit a residence and ask how many died in that residence. Keeping the scope of the question the same as the scope of the sampling would remove this source of bias from the resulting data.
In the interview process, the interviewee is asked about deaths in the household within the desired period. When deaths were reported, "surveyors requested to see a copy of any death certificate." This is one of the points that seem to solidify the analysis, that the reported deaths are actually verified. But again notice the precise wording of this statement.
Any death certificate. There is nothing to require that the death certificate be that of the reported decedent. That may seem like nitpicking, but this is an alleged scientific paper and as such ought to be very precise. This statement is, at best, exceedingly sloppy, and such sloppiness needs to be taken into consideration when analyzing the results. It is at worst a clever loophole by which the researchers can bias the numbers.
Now, why did I harp above on the sloppiness of household versus residence? It seems to me this is a way to derive exaggerated mortality numbers. In our country today, families live far apart. My father lives in Indiana, my sister in Ohio, and I in Wisconsin. I have more extended family in Florida, Maryland, and Pennsylvania, plus probably many other places with family members I've long lost touch with. So a residence and a household, depending on what their sloppy definition is trying to get at, could be pretty much the same.
But what about our country 150 years ago? You were born, lived, and died all in the same little village or town. Extended family lived in close proximity and often got together for meals and other activities. In that environment, if I visited three adjacent residences, how many households would that be? With the given murky definition, it could well be one, if they were all part of the same family, often ate together, and each home had its own entrance. Let's say that in this extended household there was one person who died. Using the methodology described in this paper, how many deaths would be counted? Three. I visit each residence and ask how many people died in the household, so each residence would report the same death and so it would be counted three times.
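The triple count can be sketched in a few lines of Python. The house and family names are made up, and the sketch assumes the worst case, where every residence answers on behalf of the whole extended family; it illustrates my concern and is not a description of what the surveyors actually recorded.

```python
def count_reported_deaths(residences, household_of, deaths_in_household):
    """Add up the deaths each visited residence reports for its household."""
    return sum(deaths_in_household[household_of[r]] for r in residences)


# Hypothetical: three adjacent residences that all see themselves as one household.
residences = ["house_a", "house_b", "house_c"]
household_of = {
    "house_a": "family_1",
    "house_b": "family_1",
    "house_c": "family_1",
}
deaths_in_household = {"family_1": 1}  # one real death in the extended family

reported = count_reported_deaths(residences, household_of, deaths_in_household)

# Counting each distinct household once gives the true number of deaths.
actual = sum(deaths_in_household[h] for h in set(household_of.values()))
print(reported, actual)
```

Here the survey tally comes out at three while only one person actually died, which is exactly the gap between asking per residence and counting per household.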
My admittedly limited understanding of Middle Eastern culture is that it is structured in a similar way, with extended family all living close together, so a household could well comprise multiple close residences. In such an environment, choosing to interview in adjacent residences increases the chances of overcounting deaths, particularly with a fuzzy definition of the term "household". If the authors can't even give a clear definition of the term, how would they expect the person being interviewed to understand, particularly when cultural and linguistic differences are taken into consideration?
I'm not accusing the authors of deliberately structuring their approach to get as big a number as possible. But I have seen many people defend the study because of its scientific rigor, which I certainly question.
Update: This commenter over at Iraq the Model sums up my main point pretty clearly:
This is also an area of cultural differences. I have noticed that the more extended parts of Iraqi families tend to be bonded much closer than families in the USA. My guess is that families would count "grandfather" as a part of the household if he happened to die during the study period. -- Even if grandfather mostly lived in his own household.
The study's reliability absolutely hinges on each person in Iraq belonging to one and only one household. Failure to establish this clearly would dramatically skew the results beyond belief.
There's another somewhat related point that should be made. The basic methodology of this study is essentially a poll. As with any polling, the precise wording of a question is crucial. The study authors explain their terms, somewhat sloppily as I've said, in English. But the interviews were presumably conducted in Arabic. So a key piece of information would be to provide the precise Arabic verbiage used in the interview question, along with an explanation of the connotations of the word choices, both due to the language and the culture. The authors do not provide this information.
Update 10/13/2006: There are some more issues with the methodology of this survey. The most important is how they chose the people to be interviewed. Basically, the country was broken down into the government's administrative districts. Each district was weighted by its population and then districts were chosen at random from this weighted distribution. Because the distribution is weighted by population, districts in Baghdad are far more likely to be chosen than districts elsewhere in Iraq. Consequently, the measured mortality (629 deaths, 87% after the invasion) is more a measure of the mortality in Baghdad than in the whole country. In order to extrapolate this measurement to the whole country, one must show that the violence is also distributed according to population. Given that the unrest in the country seems very focused in the Sunni triangle, the distribution of violence will be even more dominated by Baghdad than the distribution of population is. Therefore the measured mortality over-emphasizes Baghdad, which will have a higher mortality rate than the rest of the country, so the results are very much over-estimated.
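The selection step described above, choosing districts with probability proportional to population, can be sketched as follows. The district names and populations are hypothetical, not figures from the survey, and `pick_districts` is my own helper name. The sketch only shows how often a large district gets picked; by itself it says nothing about where deaths occurred.

```python
import random
from collections import Counter

# Hypothetical district populations, not figures from the survey.
district_population = {
    "big_city": 6_000_000,
    "province_a": 1_000_000,
    "province_b": 1_000_000,
    "province_c": 1_000_000,
}


def pick_districts(populations, n_picks, rng):
    """Choose districts (with replacement), weighted by population."""
    names = list(populations)
    weights = [populations[name] for name in names]
    return rng.choices(names, weights=weights, k=n_picks)


rng = random.Random(0)  # fixed seed so the run is repeatable
n_picks = 10_000
picks = Counter(pick_districts(district_population, n_picks, rng))
share_big_city = picks["big_city"] / n_picks
print(picks)
print(share_big_city)
```

With two thirds of the hypothetical population, the big city receives roughly two thirds of the picks, which is the sense in which the sample is dominated by the most populous area.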
Furthermore, choosing closely spaced residences for the polling introduces correlations into the data and undermines the randomness. People who live in the same neighborhood often spend time together. So if someone on the street is a victim of a bomb, there's a decent chance someone else on that street will be a victim of the same bomb. So the methodology would seem to measure an increased mortality.